
and much of science consists in determining them; thus, in a sense, “constraint” is

synonymous with “regularity.” Laws of nature are clearly constraints, and the very

existence of physical objects such as tables and aeroplanes, which have fewer degrees

of freedom than their constituent parts considered separately, is a manifestation of

constraint.

In this book we are particularly concerned with constraints applied to sequences.

Clearly, if a Markov process is in operation, the variety of the set of possible sequences

generated from a particular alphabet is smaller than it would be had successive

symbols been freely selected; that is, it is indeed “smaller than it might have been”.

“Might have been” requires the qualification, then, of “would have been if successive

symbols had been freely (or randomly—leaving the discussion of ‘randomness’ to

Chap. 11) selected”. We already know how to calculate the entropy (or information,

or Shannon index, or Shannon–Weaver index) $I$ of a random sequence (Eq. 6.5);

there is a precise way of calculating the entropy per symbol for a Markov process

(see Sect. 11.2), and the reader may use the formula derived there to verify that the

entropy of a Markov process is less than that of a “perfectly random” process. Using

some of the terminology already introduced, we may expand on this statement to say

that the surprise occasioned by receiving a piece of information is lower if constraint

is operating; for example, when spelling out a word, it is practically superfluous to

say “u” after “q.”
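To make the comparison concrete, here is a minimal Python sketch, not the formula of Sect. 11.2 verbatim, with an invented two-symbol transition matrix chosen purely for illustration. It computes the entropy per symbol of a Markov source and compares it with free selection from the same alphabet:

```python
import numpy as np

# Hypothetical two-symbol alphabet {a, b}, with a constraint: after "a" the
# next symbol is almost always "b", and vice versa (illustrative values only).
P = np.array([[0.1, 0.9],   # transition probabilities from "a"
              [0.9, 0.1]])  # transition probabilities from "b"

# Stationary distribution pi, satisfying pi = pi @ P.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()

# Entropy per symbol of the Markov source, in bits:
#   H = -sum_i pi_i sum_j P_ij log2 P_ij
H_markov = -sum(pi[i] * P[i, j] * np.log2(P[i, j])
                for i in range(2) for j in range(2))

# Entropy per symbol if successive symbols were freely (uniformly) selected.
H_free = np.log2(2)

print(f"Markov source: {H_markov:.3f} bits/symbol")  # about 0.47
print(f"Free choice:   {H_free:.3f} bits/symbol")    # 1.000
```

The constrained source always comes out below the free-choice figure, as the argument above requires; the further the transition probabilities are from uniform, the lower the entropy per symbol.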

The constraints affecting the choice of successive words are a manifestation of

the syntax of a language. 14 In the next chapter other ways in which constraint can

operate will be examined, but for now we can simply state that whenever constraint is

present, the entropy (of the set we are considering, hence of the information received

by selecting a member of that set) is lower than it would be for a perfectly random

selection from that set.

This maximum entropy (which, in physical systems, corresponds to the most
probable arrangement; i.e., to the macroscopic state that can be arranged in the
largest number of ways), which we shall call $I_{\max}$, allows us to define a relative entropy $I_{\mathrm{rel}}$,

$$I_{\mathrm{rel}} = \frac{\text{actual entropy}}{I_{\max}}, \tag{6.17}$$

and a redundancy $R$,

$$R = 1 - I_{\mathrm{rel}}. \tag{6.18}$$
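As a worked illustration of Eqs. 6.17 and 6.18, the following snippet uses assumed figures rather than results from this book: a 27-symbol alphabet (26 letters plus the space) and an actual entropy of about 1 bit per symbol, in the range Shannon reported for printed English.

```python
import math

# Assumed, illustrative figures.
I_max = math.log2(27)       # maximum entropy: free choice among 27 symbols
actual_entropy = 1.0        # assumed actual entropy per symbol, in bits

I_rel = actual_entropy / I_max   # relative entropy, Eq. 6.17
R = 1 - I_rel                    # redundancy,       Eq. 6.18

print(f"I_max = {I_max:.2f} bits/symbol")  # about 4.75
print(f"I_rel = {I_rel:.2f}")              # about 0.21
print(f"R     = {R:.2f}")                  # about 0.79
```

On these assumed figures English would be roughly three-quarters redundant, which is why a few wrong or missing letters rarely destroy a message.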

In a fascinating piece of work, Shannon (1951) established the entropy of English

essentially through empirical investigations using rooms full of people trying to guess

incomplete texts. 15

14 Animal communication is typically non-syntactic; the vast expressive power of human language would be impossible without syntax, which can be thought of as the combination of discrete components in a potentially infinite number of ways. Nowak et al. (2000) have suggested that syntax could only evolve once the number of discrete components exceeds a threshold.

15 Note that most computer languages lack redundancy: a single wrong character in a program will usually prevent it from compiling, or cause it to halt.